User Needs:

•  Selecting  Servers 

•  Answering  Questions 

•  Organizing  Responses 


Architecture  Issues: 

•  Scalability 

•  Security 

•  Business  model  for  servers 

•  Reliable  Access 


The  Wide  Area  Information  Servers  (WAIS)  system  is  an  electronic 
publishing  system  that  helps  end  users  find  unstructured  information 
located  on  remote  machines.  It  is  composed  of  user  interfaces,  available 
for  most  machines,  and  server  software.  Started  by  Thinking  Machines, 
this  system  is  becoming  a  standard  for  information  distribution  in  the 
internet  environment.  Since  many  components  are  available  for  free, 
please  try  the  system! 

What  does  WAIS  do?  Users  on  different  platforms  can  access 
personal,  company,  and  published  information  from  one  interface.  The 
information  can  be  anything:  text,  pictures,  voice,  or  formatted 
documents.  Since  a  single  computer-to-computer  protocol  is  used, 
information  can  be  stored  anywhere  on  different  types  of  machines. 
Anyone  can  use  this  system  since  it  uses  natural  language  questions  to 
find  relevant  documents.  Relevant  documents  can  be  fed  back  to  a  server 
to  refine  the  search.  This  avoids  complicated  query  languages  and 
vendor  specific  systems.  Successful  searches  can  be  automatically  run 
to  alert  the  user  when  new  information  becomes  available. 

How does WAIS work? The servers take a user's question and do their
best to find relevant documents. The servers, at this point, do not
"understand" the user's English-language question; rather, they try to
find documents that contain those words and phrases and rank them based
on heuristics. The user interfaces (clients) talk to the servers using an
extension to a standard protocol, Z39.50. Using a public standard allows
vendors to compete with each other while bypassing the usual
proprietary-protocol period that slows development. Thinking Machines is
giving away an implementation of this standard to help vendors develop
clients and servers.
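
The word-matching approach described above can be illustrated with a minimal sketch. It is not the WAIS ranking code: the scoring rule (count how often the question's words occur in each document) is an assumed stand-in for the unspecified heuristics, and the function names and sample documents are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(question, documents):
    """Return (score, doc_id) pairs, best match first.

    The score is the number of times any word from the question occurs in
    the document. This is an assumed heuristic for illustration only.
    """
    query_words = set(tokenize(question))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in query_words)
        if score > 0:
            results.append((score, doc_id))
    # Highest score first; ties are broken alphabetically by doc_id.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return results


# Hypothetical sample collection.
documents = {
    "cookbook": "Recipes for bread, soup, and apple pie.",
    "factbook": "Country facts: population, weather, and economy.",
    "weather": "Weather maps and weather forecasts for the week.",
}

ranked = rank_documents("What is the weather forecast?", documents)
```

Note that the server never parses the question as a sentence; it only matches words, which is why "forecast" does not match "forecasts" in this sketch.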

What  WAIS  servers  exist?  Even  though  the  system  is  very  new,  there 
are  already  over  100  servers  on  the  internet.  Over  5000  people  have  used 
WAIS  in  20  countries. 

•  Thinking  Machines  operates  a  Connection  Machine  on  the  internet  for 
free  use.  The  databases  it  supports  are  some  patents,  a  collection  of 
molecular  biology  abstracts,  a  cookbook,  and  the  CIA  World  Factbook. 

•  MIT  supports  a  poetry  server  with  a  great  deal  of  classical  and  modern 
poetry.  Cosmic  is  serving  descriptions  of  government  software  packages. 
The  Library  of  Congress  has  plans  to  make  their  catalog  available  on  the 
protocol. 

•  Weather  maps  and  forecasts  are  made  available  by  Thinking  Machines 
as  a  repackaging  of  existing  information. 

•  The  "directory  of  servers"  facility  is  operated  by  Thinking  Machines  so 
that  new  servers  can  be  easily  registered  as  either  for-pay  or  for-free 
servers  and  users  can  find  out  about  these  services. 

•  Dow Jones is putting a server on their own DowVision network. This
server contains the Wall Street Journal, Barron's, and 450 magazines.
This is a for-pay server.

How  can  I  find  out  more  about  WAIS? 

•  You can try a simple interface by telnetting to quake.think.com
(login: wais).

•  FTP  the  free  software  from  think.com  in  the  /wais  directory. 

•  FTP  a  bibliography: 

/pub/wais/wais-discussion/bibliography.txt@quake.think.com

•  Contact Barbara Lincoln (barbara@think.com) for more information, or
Brewster Kahle, the project leader.

•  Subscribe to a biweekly mailing list on electronic publishing issues
and new releases; to subscribe, send email to
wais-discussion-request@think.com.

Brewster  Kahle 
Project  Leader 

Wide  Area  Information  Servers 
Brewster@Think.com 

Large PC Libraries
Are Being Developed

By  JOHN  MARKOFF 

The development of a nationwide data network will allow personal
computer users to tap sources as large as the Library of Congress or
receive their own personalized electronic newspapers.

Several innovations, taken together, have already demonstrated that
searching vast computer data bases can be easier than consulting a card
catalogue, and not nearly as difficult or expensive as computer searches
are today. Computer users might read some Dickens more readily than they
could check out David Copperfield from the local library.

Those in the industry say that users with little computer skills will
soon be able to search through several terabytes of information, or
several trillion characters of text, in seconds. The Library of Congress,
with 80 million items, contains an estimated 25 terabytes of information.


Already, an experimental computer library has linked 150 universities to
40 sources of information, ranging from National Institutes of Health
data to corporate documents and Shakespeare's plays. New software allows
users to browse or zero in on particular information.

As methods of retrieving information are standardized and perfected,
industry executives and computer scientists say, thousands of new
services, ranging from electronic newspapers to the computer equivalent
of free public libraries, will blossom. "Everyone is realizing how
important it is to get into the mass market for information," said Thomas
Koulopoulos, president of Delphi Consulting Group, a Boston market
research firm.


Such ready access to huge amounts of computerized information has been
the dream of many in the industry. But a lack of computing power,
effective software and high-speed digital networks has stalled progress
until recently.

If many of the technical problems are being solved, major business and
political disputes remain. The researchers acknowledge that they must
resolve several questions of privacy and pricing before they can put the
new methods to commercial use.

Many sources of information, like government documents, might be
available free, but other services, including electronic newspapers, will
be available only to those who pay. The industry has yet to settle on
ways to protect and charge for intellectual property in a computer
network where information can be copied instantly. But to encourage
progress, the Thinking Machines Corporation, a Cambridge, Mass.,
supercomputer manufacturer, has made its software available at no charge.

Some industry enthusiasts say the new technology will transform the way
computerized information is sold. Mitchell Kapor, the founder of the
Lotus Development Corporation, predicts the growth of a new industry as
significant as the personal computer business. Some companies, like Dow
Jones & Company, that already provide computerized information over
telephone lines have taken part in developing the new computer library.

Photo caption: Brewster Kahle was the leader of the development team at
the Thinking Machines Corporation for a nationwide computerized library
system. His team's software links a CM2A Connection Machine, left, with a
personal computer or work station like the Apple Macintosh II at right.
Using high-speed data highways, the two machines can function together
although they may be thousands of miles apart.

The  Search  Is  Simplified 

In 1989, Thinking Machines enlisted the support of Dow Jones, Apple
Computer Inc. and the KPMG Peat Marwick accounting and consulting firm to
design the computer library, called Wide Area Information Servers, or
WAIS (pronounced ways). The system permits computer users to quickly
search through a huge volume of information even if it is stored at
several distant locations.

The system lets users conduct searches by typing common English phrases
instead of more complicated computer commands. While current systems like
Dialog and Nexis require users to specify precisely the information they
want, the new system can respond to a user's inferences. It initially
presents a sample list of documents. The user chooses one or several, and
then a "relevance feedback" program presents other documents most like
the ones selected.
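
The relevance feedback loop described above can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the WAIS algorithm: similarity is taken to be the number of distinct non-trivial words a document shares with the user's selections, and the stopword list, function names, and sample documents are all hypothetical.

```python
import re

# Assumed list of words too common to signal similarity.
STOPWORDS = {"a", "an", "and", "for", "of", "the"}


def words(text):
    """Return the set of distinct lowercase words in text, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def more_like_these(selected_ids, documents):
    """Return ids of unselected documents, most similar to the selection first.

    Similarity is the count of distinct words shared with the selected
    documents; documents sharing no words are left out.
    """
    profile = set()
    for doc_id in selected_ids:
        profile |= words(documents[doc_id])
    scored = []
    for doc_id, text in documents.items():
        if doc_id in selected_ids:
            continue
        overlap = len(profile & words(text))
        if overlap > 0:
            scored.append((overlap, doc_id))
    # Largest overlap first; ties are broken alphabetically by doc_id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical sample collection.
documents = {
    "a": "patent for a solar water heater",
    "b": "patent for a solar panel mount",
    "c": "recipe for tomato soup",
    "d": "solar eclipse viewing guide",
}

# The user marks document "a" as relevant and asks for more like it.
similar = more_like_these({"a"}, documents)
```

The user never writes a query here; choosing a document is the query, which is the contrast with systems that require a precisely specified request.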

"This solves the problem of how to get to the information you need,
getting not too much and not too little," said Esther Dyson, editor of
Release 1.0, a computer industry newsletter.

It will soon be possible to search through millions of items in seconds.

This is a sharp contrast to the way services operate today, Ms. Dyson
said. A computer user may need to call seven or eight separate data bases
depending on the kind of information needed.

The WAIS system lets users of Apple personal computers harness a network
of Thinking Machines supercomputers and smaller "server" computers to
search data bases stored by Dow Jones, KPMG and several corporations and
universities. Users can also read electronic mail, enter their corporate
electronic libraries and summon up a wide variety of documents,
newspapers and magazines.

A  'Corporate  Memory' 

At Thinking Machines, the WAIS system serves as a "corporate memory,"
allowing employees to retrieve memos, documents and other internal
information. Employees who may not be working together can share
expertise.

"If someone did something in Los Angeles and I'm sitting in San
Francisco, I may not know about the work," said Robin Palmer, a senior
manager at Peat Marwick.

WAIS delivers information over the Internet, a collection of 2,600
high-speed public and private computer networks. This
Government-sponsored system of data highways is rapidly being improved
and turned to commercial uses.

The market for software that allows the rapid retrieval of computerized
text is small but growing, according to industry analysts. In 1989, the
United States had fewer than 60,000 users; by the next year, total sales
were about $120 million. The Delphi Consulting Group expects the market
to grow to 160,000 users and $235 million by 1992.

"Information retrieval technology is starting to spread from
supercomputers all the way down to personal computers," said Brewster
Kahle, a Thinking Machines scientist who has led the WAIS experiment.

The WAIS system is built on a procedure for retrieving information
developed by librarians who initially set out to computerize their card
catalogues. The procedure, known in the field as Z39.50, now has the
support of the Library of Congress, Apple, Sun Microsystems Inc., Next
Inc., Dow Jones and Mead Data Central.

In the future, a special directory or "white pages" will keep an
up-to-date list of all the separate sources on the network.

Spreading Information

The Wide Area Information Servers system provides a broad range of
information by linking users to many independent sources. The information
can be in the form of sound, words or pictures.

Apple has its own electronic library project, borrowing its name,
Rosebud, from the movie "Citizen Kane." The three-year-old project is
based on the WAIS system, but adds features including the ability for a
user to develop a personalized electronic newspaper.

Rosebud uses special programs, called "reporters," that let customers
specify the kinds of information and news they want to retrieve from the
WAIS system every day. Researchers at Apple's Advanced Technology Group
said that in the future the necessary retrieval software might be a
standard part of a computer's operating system.
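
A "reporter" of the kind described above amounts to a saved question that is rerun on a schedule and delivers only what is new. The following sketch shows that idea under loud assumptions: nothing here reflects how Rosebud was built, the matching rule (a document must contain every word of the question) is invented for illustration, and the class name, method name, and sample headlines are hypothetical.

```python
import re
from dataclasses import dataclass, field


def word_set(text):
    """Return the set of distinct lowercase words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class Reporter:
    """A saved question that remembers which documents it already delivered."""

    question: str
    seen: set = field(default_factory=set)

    def run(self, documents):
        """Return ids of not-yet-delivered documents containing every question word."""
        wanted = word_set(self.question)
        fresh = []
        for doc_id, text in sorted(documents.items()):
            if doc_id not in self.seen and wanted <= word_set(text):
                fresh.append(doc_id)
                self.seen.add(doc_id)
        return fresh


# Hypothetical daily runs against whatever the servers hold that day.
reporter = Reporter("supercomputer sales")
day_one = reporter.run({
    "n1": "Supercomputer sales rise",
    "n2": "Local weather report",
})
day_two = reporter.run({
    "n1": "Supercomputer sales rise",
    "n3": "Sales of a new supercomputer begin",
})
```

Collecting each day's `fresh` list across several reporters would give the raw material for a personalized newspaper.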

They expect improvements in the Internet computer network to greatly
lower the cost of information searches, promoting the introduction of
many new services. The Government proposes to expand and improve Internet
by financing a National Research and Education Network, or NREN, that
could extend high-speed computer links into schools and communities
across the country.

"With things like NREN, everything could change overnight," said Tim
Oren, an Apple researcher.


